Show the code
pacman::p_load(sf, spdep, tmap, tidyverse, patchwork)Amelia Chua
Water is an essential resource that not only supports life but also drives economic development. According to the World Bank, approximately 2 billion people in the world do not have safely managed drinking water services and 3.6 billion people lack safely managed sanitation services1. Developing countries are most affected by the shortage of water. The lack of ground water threatens their fight against poverty, food and water security and socio-economic development2.
Organisations like the World Bank, UNICEF and Water Point Data Exchange (WPdx) have various plans and schemes in place to combat this issue. In particular, WPdx has it in their mission to unlock potential of water point data to improve rural water services through evidence-based decision-making. It maintains a global data repository for data collected from rural areas at the water point or small water scheme level. A notable point is that data is formatted according to the WPdx Data Standard before being uploaded and published onto the repository. Using these information, decision support tools linked to the repository will be able to perform advanced analyses seamlessly3.
Geospatial analytics hold tremendous potential to address complex societal problems like water shortage. In this study, I will apply appropriate global and local measures of spatial association techniques to reveal the spatial patterns of functional and non-functional water points. This will include will not limited to the following tasks:
Using appropriate sf method, import the shape file into R and save it in a simple feature data frame format. Note that there are three Projected Coordinate Systems of Nigeria, they are: EPSG: 26391, 26392, and 26303. You can use any one of them.
Using appropriate tidyr and dplyr methods, derive the proportion of functional and non-functional water point at LGA level.
Combining the geospatial and aspatial data frame into simple feature data frame.
Performing outliers/clusters analysis by using appropriate local measures of spatial association methods.
Performing hot spot area analysis by using appropriate local measures of spatial association methods.
The focus of this study would be Nigeria. Nigeria is located in West Africa and is the most populous country in Africa. The states are grouped into six geopolitical zones, the North Central, North East, North West, South West, South East and South4. UNICEF estimates that one third of Nigeria children do not have sufficient water to meet daily needs5.
The code chunk below installs and loads sf, spdep, tmap, tidyverse, patchwork packages into R environment. pacman() is a R package management tool.
As mentioned in the earlier section, the focus of this study is Nigeria. Two data sets will be used in this study. They are:
Nigeria Level-2 Administrative Boundary (also known as Local Government Area or LGA) polygon feature GIS data. The data was obtained from geoBoundaries.
WPdx+ data set that was obtained from Water Point Data Exchange (WPdx). It consists of water point related data from rural areas at the water point or small water scheme level. The entire set of data includes countries other than Nigeria. Hence, we will be performing data pre-processing to extract the relevant data.
The raw WPdx+ data file is 427mb and exceeds the upload limit of Github. In the next section, we will extract the relevant and necessary information, extract it into a .rds file and use the file for subsequent analysis. The raw file will not be pushed to Github to avoid crashing the Github repository.
The geospatial data is in ESRI shape file format and the attribute table is in csv format.
The code chunk below uses st_read() function of sf package to import geoBoundaries-NGA-ADM2 shape file into R as a polygon feature data frame. The imported shape file will be a simple features object of sf.
Reading layer `geoBoundaries-NGA-ADM2' from data source
`C:\ameliachuayt\ISSS624\Exercises\Take-home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
From the output, we can see that there are 774 multipolygons features with 5 fields. nigeria is in WGS 84 coordinates system. The bounding box provides the x extend and y extend of the data.
To learn more about the attribute information, we can apply glimpse() of dplyr package.
Rows: 774
Columns: 6
$ shapeName <chr> "Aba North", "Aba South", "Abadam", "Abaji", "Abak", "Abaka…
$ Level <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ shapeID <chr> "NGA-ADM2-72505758B79815894", "NGA-ADM2-72505758B67905963",…
$ shapeGroup <chr> "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NGA", "NG…
$ shapeType <chr> "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "ADM2", "AD…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((7.401109 5...., MULTIPOLYGON (…
The printout above details the data type of each field. For instance, $ shapeName is in character data type.
The WPdx+ data set has 70 columns and 406,566 rows.
At any point in time when we wish to see the columns of a dataframe, we can use glimpse(). glimpse() allows us to see all columns and their data type in the data frame which is very helpful since we have 70 columns.
For ease of referencing, let’s rename shapeName to LGA.
Using duplicated() from R base functions, we can seek out LGA names that might be duplicated. This step is important for later part of the analysis.
Based on the above, we have 6 LGAs that have the same name. A desk-based research using the coordinates showed that there are two scenarios that led to this duplication:
Let’s correct these errors:
nigeria$LGA[94] <- "Bassa_Kogi"
nigeria$LGA[95] <- "Bassa_Plateau"
nigeria$LGA[304] <- "Ifelodun, Kwara"
nigeria$LGA[305] <- "Ifelodun, Osun"
nigeria$LGA[355] <- "Irepodun, Kwara"
nigeria$LGA[356] <- "Irepodun, Osun"
nigeria$LGA[519] <- "Nassarawa, Kano"
nigeria$LGA[546] <- "Obi, Benue"
nigeria$LGA[547] <- "Obi, Nasarawa"
nigeria$LGA[693] <- "Surulere, Lagos"
nigeria$LGA[694] <- "Surulere, Oyo"Next, let’s double check that the corrections are made.
The entire data set is large, and we need only to extract the relevant information required for the analysis. The focus of the study is Nigeria and the analysis will be done at the Level-2 Administrative Boundary (or LGA) level. Therefore, we will be performing steps to:
Besides the above, we will also be performing data preparation and wrangling techniques to surface data issues and resolve them prior to the analysis. Before we start our data wrangling, it would be useful to inspect the metadata to understand what each column represents.
To learn which column(s) to use to filter for Nigeria’s data, we can inspect the metadata. We will not display the entire metadata here as it is lengthy. However, here is an excerpt of some columns:
| Column Name | Description |
|---|---|
| #clean_country_name | Cleaned version of the country name based on provided GPS coordinates. |
| #clean_adm1 | Cleaned version of the Primary Administrative Division data based on provided GPS coordinates and GADM boundaries. |
| #clean_adm2 | Cleaned version of the Secondary Administrative Division data based on provided GPS coordinates and GADM boundaries. |
| #status_id | Identify if any water is available on the day of the visit, recognizing that it may be a limited flow. |
| #status_clean | Categorized version of the #status parameter. Based on terms from the #status entry, status_clean includes 5 categories: Fully functional, Functional but needs repair, Non functional and needs repair, Non functional due to dry season, Abandoned and Other. These categories will continue to evolve and will be refined in future updates. |
| #status | Status of the physical/mechanical condition of the water point. |
Based on the above, we can use #clean_country_name to filter out rows belonging to Nigeria. This can be done using the code chunk below. The filtered data set will be saved as wpdx_nigeria. We can also inspect the first few rows of the data by using head().
Let’s use dim() to reveal the dimensions of the wpdx_nigeria.
The output of the above code would reveal that wpdx_nigeria has 95,008 rows and 70 columns.
#status_clean provides us the status of the water points. Using the code chunk below, we use count() dplyr package to count the frequency of the each location and/or category.
| #status_clean | n |
|---|---|
| Abandoned | 175 |
| Abandoned/Decommissioned | 234 |
| Functional | 45,883 |
| Functional but needs repair | 4,579 |
| Functional but not in use | 1,686 |
| Non-Functional | 29,385 |
| Non-Functional due to dry season | 2,403 |
| Non Functional due to dry season | 7 |
| NA | 10,656 |
From the output above, we observe two issues for the #status_clean column: misspellings, missing data and too many categories. Let’s tackle misspellings first.
We can easily see that there two similar categories “Non functional due to dry season” = “Non-Functional due to dry season”. One is spelled with a dash and one without. Let’s correct this by using the recode() from dplyr package.
We can confirm that the categories has been recoded correctly by running count() again.
We also note that there are 10,656 missing values in the #status_clean column. Let’s rename all the NA as ‘Unknown’.
In our study, we would like know the functional and non-functional water points. Therefore, we can actually aggregate our data into three categories: functional, non-functional and unknown as follows:
| Old Category | New Category |
|---|---|
| Abandoned | Non-Functional |
| Abandoned/Decommissioned | Non-Functional |
| Functional | Functional (no change) |
| Functional but needs repair | Functional |
| Functional but not in use | Functional |
| Non-Functional | Non-Functional (no change) |
| Non-Functional due to dry season | Non-Functional |
| Unknown | Unknown (no change) |
We will create a new column that states whether the water point if functional or not using the code chunk below.
#recode
wpdx_nigeria_clean <- wpdx_nigeria_clean %>%
mutate(`Functional_Status` = `#status_clean`) %>%
mutate(`Functional_Status` = recode(`Functional_Status`,
"Abandoned" = "Non-Functional",
"Abandoned/Decommissioned" = "Non-Functional",
"Functional but needs repair" = "Functional",
"Functional but not in use" = "Functional",
"Non-Functional due to dry season" = "Non-Functional"))Again, we can re-run the frequency count to confirm the recode has been performed.
Let us rename #clean_adm2 to LGA.
wpdx_nigeria_simple <- subset(wpdx_nigeria_clean ,
select = c("LGA", "#lat_deg",
"#lon_deg","Functional_Status"))
#"#water_source_clean",
# "#water_source_category", "#distance_to_primary_road",
# "#distance_to_secondary_road", "#distance_to_tertiary_road",
# "#distance_to_city", "#distance_to_town", "usage_capacity",
# "is_urban", "cluster_size", Next, we will create a simple feature data frame from wpdx_nigeria_simple. This is done using the code chunk below.
We can examine the content of this newly created simple feature data frame using the following code chunk.
From the output, we see that a new column called geometry has been added and the original #lon_deg and #lat_deg columns have been removed.
We can also use st_geometry() to retrieve the geometry list-column as shown in the code chunk below.
The output displays basic information of the feature class such as type of geometry, the geographic extent of the features and the coordinate system of the data. What we can see is that wpdx_nigeria_sf is in the WGS 84 coordinate system.
Next, we can count the number of water points in each LGA using the following code chunk. Two operations are happening at the same time. First, the code chunk identifies water points located inside each LGA by using st_intersects(). Next, lengths() of Base R is used to calculate the number of water points that fall inside each LGA.
nigeria$wpt_functional <- lengths(st_intersects(nigeria, functional))
nigeria$wpt_nonfunctional <- lengths(st_intersects(nigeria, non_functional))
nigeria$wpt_unknown <-lengths(st_intersects(nigeria, unknown))
nigeria$wpt_total <- lengths(st_intersects(nigeria, functional)) +
lengths(st_intersects(nigeria, non_functional)) +
lengths(st_intersects(nigeria, unknown))We will transform nigeria from geographic coordinate system to projected coordinate system. We need to do this transformation because the geographic coordinate system is inappropriate if the analysis require the use of distance and/or area measurements. This would be at the later stage where we compute distance-based contiguity weight matrices.
There are three Projected Coordinate Systems of Nigeria: EPSG: 26391, 26392, and 26303. For this study, we will be EPSG 26391. We can use the st_transform() of the sf package to re-project nigeria from one coordinate system to another coordinate system mathematically.
We will save a copy of the nigeria in the geographic coordinate system before the projection.
Next, let us display the content of nigeria sf data frame as shown below.
Notice that it is in projected coordinate system now. Furthermore, if you refer to Bounding box:, the values are greater than 0-360 range of decimal degree commonly used by most of the geographic coordinate systems.
Next, let’s save our cleaned data into .rds data format files using the write_rds() of readr package. The output file is called wp_nga.rds and it is saved in rds sub-folder. We do this to shorten the loading time and more importantly, we can avoid uploading the large raw files onto GitHub.
Note: WPdx+ also offers the data in shape file format. We can also download that version for our analysis. This is covered here, if you wish to learn more.
Let’s load the .rds file using the readRDS() function.
To avoid skewing the subsequent analysis, we will exclude LDAs without any water points (functional, non-functional and unknown).
There are 13 LGAs without any water points. The number of LGAs decreased from 774 to 761.
While our interest is in geographical distribution of functional and non-functional water points, it would be interesting to see the water point density too. To do this, we must first compute the area of LGA and then the water point density.
The code chunk below uses st_area() of sf package to derive the area of each LGA. We are creating a new column Area to store the area values.
ggplot(data=nigeria,
aes(x=as.numeric(`wpt_func_density`))) +
geom_histogram(bins=20,
color="black",
fill = "pink") +
labs(title = "Are functional water points evenly distributed in Nigeria?",
subtitle = "While there are many LGAs less than one waterpoint per km sq, there is 1 LGA with 10 \nfunctional water points per km sq.",
x = "Functional Water Point density (per km sq)",
y = "Frequency")
We can repeat the same for non-functional water points.
nigeria <- nigeria %>%
mutate(`wpt_nonfunc_density` = (`wpt_nonfunctional` / Area * 1000000))
ggplot(data=nigeria,
aes(x=as.numeric(`wpt_nonfunc_density`))) +
geom_histogram(bins=20,
color="black",
fill = "orange") +
labs(title = "Are non-functional water points evenly distributed in Nigeria?",
subtitle = "It would be good to know that most areas have less than one non-functional water point \nper km sq.",
x = "Non-Functional Water Point density (per km sq)",
y = "Frequency")
There are 94,979 water points in Nigeria. This can be calculated in the code chunk below.
Let’s find out the distribution of the various functional status of water points in Nigeria and use ggplot2 to visualise.

As we can see in the figure above, more than half of the water points are functional, slightly more than a third are non-functional with the remaining unknown.
In the code chuck below, we will use tmap to plot the spatial distribution of water points of different categories. We will use tmap_arrange() to show the plots together.
Before this, we will create a helper function that will help us to plot the choropleths with ease.
# input: the dataframe and the variable name, chart style, title
choropleth_plot <- function(df, varname, style, title) {
tm_shape(df) +
tm_fill(varname,
n= 5,
style = style) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = title,
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)+
tm_compass(position = c('left','bottom'))
}Here, we will plot the distributions of all, unknown, functional and non-functional water points.
tmap_mode("plot")
tmap_arrange(
choropleth_plot(nigeria, "wpt_total", "pretty", "All water points in Nigeria: \npartitioned by 'pretty' intervals"),
choropleth_plot(nigeria, "wpt_unknown", "pretty", "Status unknown water points in Nigeria: \n partitioned by 'pretty' intervals"),
choropleth_plot(nigeria, "wpt_functional", "pretty", "Functional water points in Nigeria: \n partitioned by 'pretty' intervals"),
choropleth_plot(nigeria, "wpt_nonfunctional", "pretty", "Non-functional water points in Nigeria: \n partitioned by 'pretty' intervals"))
The choropleths above are partitioned using the default or ‘pretty’ intervals. We can observe that the number of water points (total) are not evenly spread across evenly–we see mostly lighter shades of orange and certain areas with darker orange. Water points with status unknown are more common in the southern part of the map as compared to the northern parts of Nigeria. In terms of functional water points, we can see areas in the top middle of the map are in darker shades of orange which indicates a higher number of functional water points compared to other areas. For non-functional water points, we see that the outer regions of the maps have lesser non-functional water points. Moving towards the center of the map, we see more non-functional water points. Note that this may be because of the total of water points in the area. For instance, if there is a small number (e.g., <10) of water points in an area and the worse case scenario is that all are non-functional. Compared to an area with >50 water points, having 10 non-functioning water points is only 20% of its supply source.
Let’s try to plot the same charts, this time, using the quantile partitioning method.
tmap_mode("plot")
tmap_arrange(
choropleth_plot(nigeria, "wpt_total", "quantile", "All water points in Nigeria: \npartitioned by 'quantile' intervals"),
choropleth_plot(nigeria, "wpt_unknown", "quantile", "Status unknown water points in Nigeria: \n partitioned by 'quantile' intervals"),
choropleth_plot(nigeria, "wpt_functional", "quantile", "Functional water points in Nigeria: \n partitioned by 'quantile' intervals"),
choropleth_plot(nigeria, "wpt_nonfunctional", "quantile", "Non-functional water points in Nigeria: \n partitioned by 'quantile' intervals"))
The quantile partitioning method creates intervals with an equal number of features i.e., polygons. We can make different observations using this map. When looking at total water points, we can see that north eastern areas and south western areas have relatively less water points as compared to the other areas. This pattern is also seen in the functional water points and non-functional water points choropleth maps. We can see that the northern parts of Nigeria have relatively higher number of functional water points, and relatively lesser non-functional water points as compared to other areas. As we can see in the above, plotting the graphs in “quantiles” gives us a better sense of relative levels of water points as compared to the “pretty” method.
We can also plot our choropleths using its proportions.
#Repeat plot with the prportions
tmap_mode("plot")
tmap_arrange(
choropleth_plot(nigeria, "pct_functional", "pretty", "Proportion of Functional water points \nin Nigeria: partitioned by 'pretty' intervals"),
choropleth_plot(nigeria, "pct_nonfunctional", "pretty", "Proportion of Non-functional water points \nin Nigeria: partitioned by 'pretty' intervals"))
The above two maps are plotted in the same scale (refer to the legend) which makes it easy for comparison between the two. The top 1/3 of the maps seem to have a higher proportion of functional water points than non-functional. The bottom 1/3 of the maps seem to have a slightly higher proportional of non-functional water points than functional ones.
We can also see that there is somewhat an inverse relationship when the proportion is >0.60. This means that the darker areas in one map would likely show up as light areas in the other. For instance, the top of the functional water points maps is in darker shades of orange while the same areas are in lighter shades in the non-functional water points map. We say that it is ‘likely’ the case because there are areas with status unknown water points as well.
Exploratory Spatial Data Analysis or ESDA is an extension of Exploratory Data Analysis. It consists of descriptive techniques to discover spatial distribution of data and identify outliers6.
In this section, we will cover global and local spatial autocorrelation. The former focuses on the overall trend while the latter enables us to identify hot and cold spots in the data.
According to Tobler’s First Law of Geography, “everything is related to everything else, but near things are more related than distant things.”
This sub-section will cover the computation of global spatial autocorrelation statistics and spatial complete randomness test for global spatial autocorrelation. The goal of these analyses is to understand whether water points are evenly distributed across Nigeria.
To compute global spatial autocorrelation, we first need to construct spatial weights of the Nigeria. Spatial relationships are multi-directional and multi-lateral (Ref: Compulsory Reading). We can use Spatial Weights to define spatial neighbourhood for subsequent spatial analysis. There are two commonly used methods of spatial weights: contiguity-based and distanced-based.
In contiguity-based, neighbours share a common boundary which is considered differently in different methods. In Rook contiguity, neighbours have a common edge. In Queen contiguity, neighbours need to only share a common vertex or edge. So in comparison, Queen contiguity is more encompassing as compared to Rook contiguity7.
The differences between the two is illustrated in the picture below.
In distance-based contiguity, we have fixed weighting and adaptive weighting schemes. The former considers two regions are neighbours if they are within a specified distance from one another. In the latter scheme, each region will have the same number of neighbours. The number of neighbour is specified beforehand. If k = 8 neighbours, it classifies the nearest 8 regions as neighbours.
Which method of spatial weights method to use depends on the geographical location we are working with. If the geographical location consists of many isolated islands, then contiguity-based matrix may yield many regions with no neighbours. If the sizes of the features (polygons) are wide ranging where you have very large features and relatively smaller features, then contiguity-based matrix may also result in larger features having many more neighbours which may skew the results–there would be a smoothing effect to the larger number of neighbours.
In the code chunk below, poly2nb() of spdep package is used to compute contiguity weight matrices for the study area. This function builds a neighbours list based on regions with contiguous boundaries. By default, Queen contiguity is applied.
Neighbour list object:
Number of regions: 761
Number of nonzero links: 4348
Percentage nonzero weights: 0.750793
Average number of links: 5.713535
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 14
4 16 57 122 177 137 121 71 39 11 4 1 1
4 least connected regions:
136 497 513 547 with 1 link
1 most connected region:
496 with 14 links
The summary report above shows that there are 761 area units in Nigeria. The most connected region has 14 links. There are four least connected regions with only one neighbour.
In the code chunk below, poly2nb() of spdep package is used to compute contiguity weight matrices for the study area. We specify queen = FALSE to compute Rook contiguity.
Neighbour list object:
Number of regions: 761
Number of nonzero links: 4328
Percentage nonzero weights: 0.7473395
Average number of links: 5.687254
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 14
4 16 59 124 176 138 123 65 40 10 4 1 1
4 least connected regions:
136 497 513 547 with 1 link
1 most connected region:
496 with 14 links
The summary report above shows that there are 751 area units in Nigeria. The most connected region has 14 links. There are four least connected regions with only one neighbour.
A summary of the results are below. In our case, both methods yield similar results.
| Queen Contiguity | Rook Contiguity | |
|---|---|---|
| No. of regions with no links | 0 | 0 |
| Average number of links | 5.71 | 5.69 |
We will derive the distance-based weight matrices by using dnearneigh() of spdep package. The function identifies neighbours of region points by Euclidean distance with a distance band with lower and upper bounds controlled by the bounds argument or by Great Circle distance in kilometres if longlat argument is set to TRUE
Determining the cut-off distance
To ensure that each region has at least one neighbour, we need to find out the minimum distance within which all regions have at least one neighbour. We can do this by following these steps:
Getting the coordinates of polygon centroids. This is required as an input in the next step.
We need to associate each polygon with a point and its coordinates need to be in a separate data frame. We will use a mapping function that applies a given function to each element of a vector and returns a vector of the same length. Our input vector will be the geometry column of nigeria. Our function will be st_centroid(). We will be using map_dbl() variation of map from the purrr package. purrr is loaded when we load tidyverse package.
To get our longitude values we map the st_centroid() function over the geometry column of nigeria and access the longitude value through double bracket notation [[]] and 1. This allows us to get only the longitude, which is the first value in each centroid.
We do the same for latitude with one key difference. We access the second value per each centroid with [[2]]
Now that we have latitude and longitude, I used cbind to put longitude and latitude into the same object. We should check the first few observations to see if things are formatted correctly.
Return a matrix with the indices of points belonging to the set of the k nearest neighbours of each other by using knearneigh() of spdep.
Convert the knn object returned by knearneigh() into a neighbours list of class nb with a list of integer vectors containing neighbour region number ids by using knn2nb().
Return the length of neighbour relationship edges by using nbdists() of spdep. The function returns in the units of the coordinates if the coordinates are projected, in km otherwise.
Remove the list structure of the returned object by using unlist().
Min. 1st Qu. Median Mean 3rd Qu. Max.
2669 12808 20008 21804 27013 72139
The summary report shows that the largest first nearest neighbour distance is 72,139 metres, so using a number slightly larger than this (i.e. 72,200) as the upper threshold gives certainty that all regions will have at least one neighbour.
Using the code chunk below, we discover that the region with the maximum distance to its nearest neighbour is Sardauna.
Computing the fixed distance weight matrix
Now, we will compute the distance weight matrix by using dnearneigh() as shown below.
Neighbour list object:
Number of regions: 761
Number of nonzero links: 18050
Percentage nonzero weights: 3.116793
Average number of links: 23.71879
From the output, we see that the average number of links is 23.7. The number is quite high and may skew the analysis.
Next, we will use str() to display the content of wm_d72 weight matrix.
List of 761
$ : int [1:63] 2 4 9 24 54 65 67 101 120 179 ...
$ : int [1:62] 1 4 9 24 54 65 67 101 120 179 ...
$ : int [1:10] 11 19 252 257 438 445 457 628 677 682
$ : int [1:56] 1 2 54 65 102 134 135 167 182 200 ...
$ : int [1:21] 8 13 17 18 55 168 215 216 324 331 ...
$ : int [1:19] 7 14 21 174 175 212 275 276 277 289 ...
$ : int [1:32] 6 14 21 48 174 175 212 269 270 271 ...
$ : int [1:26] 5 17 18 55 65 76 101 102 215 216 ...
$ : int [1:64] 1 2 22 24 65 101 179 188 189 201 ...
$ : int [1:22] 25 26 42 67 124 155 188 189 202 330 ...
$ : int [1:11] 3 133 252 257 394 409 421 438 445 677 ...
$ : int [1:13] 30 36 37 39 92 209 314 387 428 462 ...
$ : int [1:24] 5 168 191 192 193 215 303 304 305 356 ...
$ : int [1:27] 6 7 21 31 48 50 61 81 174 175 ...
$ : int [1:37] 29 37 38 40 43 44 69 70 118 122 ...
$ : int [1:34] 27 28 34 71 170 171 176 177 180 269 ...
$ : int [1:30] 5 8 18 55 65 76 101 102 215 216 ...
$ : int [1:42] 5 8 17 24 55 65 76 101 102 179 ...
$ : int [1:7] 3 104 237 257 411 445 457
$ : int [1:9] 59 60 160 263 474 508 565 583 613
$ : int [1:31] 6 7 14 31 48 50 61 81 174 175 ...
$ : int [1:64] 9 24 51 52 53 55 57 76 77 78 ...
$ : int [1:5] 121 466 514 660 748
$ : int [1:68] 1 2 9 18 22 53 55 65 76 101 ...
$ : int [1:30] 10 26 42 67 155 188 189 202 330 364 ...
$ : int [1:24] 10 25 42 67 155 189 202 330 364 365 ...
$ : int [1:43] 16 28 34 69 70 122 170 171 176 177 ...
$ : int [1:45] 16 27 34 69 70 122 170 171 176 177 ...
$ : int [1:30] 15 37 38 39 40 43 44 173 183 184 ...
$ : int [1:13] 12 36 92 156 208 209 210 283 302 548 ...
$ : int [1:28] 14 21 48 50 61 81 175 194 205 212 ...
$ : int [1:29] 46 109 128 140 143 153 164 217 225 231 ...
$ : int [1:10] 41 102 134 135 211 369 540 546 720 744
$ : int [1:33] 16 27 28 71 170 171 176 177 180 269 ...
$ : int [1:8] 49 105 244 400 424 446 668 746
$ : int [1:21] 12 30 37 38 39 40 184 190 195 196 ...
$ : int [1:25] 12 15 29 36 38 39 40 43 184 190 ...
$ : int [1:27] 15 29 36 37 39 40 43 183 184 190 ...
$ : int [1:21] 12 29 36 37 38 40 43 184 190 209 ...
$ : int [1:22] 15 29 36 37 38 39 43 44 184 190 ...
$ : int [1:19] 33 134 135 182 200 279 280 369 488 525 ...
$ : int [1:19] 10 25 26 67 120 124 155 188 189 243 ...
$ : int [1:27] 15 29 37 38 39 40 44 69 173 184 ...
$ : int [1:27] 15 29 40 43 69 173 185 186 190 284 ...
$ : int [1:12] 117 374 381 409 415 421 430 450 509 643 ...
$ : int [1:24] 32 109 125 128 153 164 225 232 236 239 ...
$ : int [1:12] 63 64 73 111 129 259 380 399 420 472 ...
$ : int [1:30] 7 14 21 31 50 61 81 174 175 205 ...
$ : int [1:4] 35 105 401 424
$ : int [1:27] 14 21 31 48 61 81 175 205 212 278 ...
$ : int [1:47] 22 52 53 56 57 76 77 78 79 163 ...
$ : int [1:37] 22 51 53 56 57 77 78 79 163 187 ...
$ : int [1:58] 22 24 51 52 55 56 57 76 77 78 ...
$ : int [1:33] 1 2 4 67 120 155 167 182 188 206 ...
$ : int [1:51] 5 8 17 18 22 24 53 65 76 77 ...
$ : int [1:35] 51 52 53 57 77 78 79 163 187 195 ...
$ : int [1:37] 22 51 52 53 56 77 78 163 187 195 ...
$ : int [1:5] 126 127 482 687 735
$ : int [1:14] 20 60 156 263 304 305 548 550 565 576 ...
$ : int [1:11] 20 59 160 262 263 474 565 576 579 583 ...
$ : int [1:28] 14 21 31 48 50 81 175 194 205 212 ...
$ : int [1:5] 378 408 458 752 759
$ : int [1:7] 47 64 73 111 129 259 399
$ : int [1:11] 47 63 73 107 111 259 380 399 670 688 ...
$ : int [1:47] 1 2 4 8 9 17 18 24 55 101 ...
$ : int [1:26] 71 118 122 177 180 298 299 333 340 341 ...
$ : int [1:30] 1 2 10 25 26 42 54 120 155 188 ...
$ : int [1:7] 138 144 245 268 489 500 501
$ : int [1:44] 15 27 28 43 44 70 118 122 170 171 ...
$ : int [1:50] 15 27 28 69 118 122 170 171 173 176 ...
$ : int [1:20] 16 34 66 180 342 355 368 372 397 553 ...
$ : int [1:6] 355 368 371 397 652 653
$ : int [1:14] 47 63 64 107 111 114 247 259 659 670 ...
$ : int [1:15] 108 227 250 253 266 367 376 392 414 425 ...
$ : int [1:9] 249 281 419 450 461 534 634 664 738
$ : int [1:56] 8 17 18 22 24 51 53 55 77 78 ...
$ : int [1:51] 22 51 52 53 55 56 57 76 78 79 ...
$ : int [1:57] 22 51 52 53 55 56 57 76 77 79 ...
$ : int [1:39] 22 51 52 53 56 76 77 78 163 187 ...
$ : int [1:19] 97 143 225 231 239 250 264 418 440 473 ...
$ : int [1:21] 14 21 31 48 50 61 175 205 212 291 ...
$ : int [1:6] 130 253 377 406 516 754
$ : int [1:3] 146 429 679
$ : int [1:38] 99 103 128 140 143 153 154 217 233 239 ...
$ : int [1:19] 145 147 149 219 224 242 261 393 402 407 ...
$ : int [1:5] 148 479 635 687 701
$ : int [1:11] 98 105 157 255 400 454 529 661 663 668 ...
$ : int 235
$ : int [1:2] 158 265
$ : int [1:11] 93 117 384 385 386 415 477 629 643 655 ...
$ : int [1:3] 348 594 652
$ : int [1:7] 12 30 156 428 548 583 696
$ : int [1:10] 90 384 385 386 398 415 460 643 695 757
$ : int [1:17] 95 106 137 165 166 344 383 396 404 412 ...
$ : int [1:13] 94 106 112 137 145 166 383 396 412 442 ...
$ : int [1:4] 151 229 424 683
$ : int [1:18] 80 143 152 165 225 231 250 264 418 440 ...
$ : int [1:12] 87 113 222 244 248 255 267 400 446 449 ...
$ : int [1:35] 84 128 140 153 154 217 232 233 258 261 ...
[list output truncated]
- attr(*, "class")= chr "nb"
- attr(*, "region.id")= chr [1:761] "1" "2" "3" "4" ...
- attr(*, "call")= language dnearneigh(x = coords, d1 = 0, d2 = 72200)
- attr(*, "dnn")= num [1:2] 0 72200
- attr(*, "bounds")= chr [1:2] "GE" "LE"
- attr(*, "nbtype")= chr "distance"
- attr(*, "sym")= logi TRUE
We can observe that each region has varying number of neighbours.

Due to a high number of links, we have very dense graphs which make it difficult to interpret. However, we can still make some observations:
The above charts actually illustrates a characteristic of fixed distance weight matrix–more densely settled areas (usually the urban areas) tend to have more neighbours and the less densely settled areas (usually the rural counties) tend to have lesser neighbours.
Based on the above charts, we can tell that the geographical areas of the regions in Nigeria are largely varying. In the top and bottom of the charts, we see the neighbour links are very dense and less dense in the western and eastern regions (where you can see pockets of white space).
To overcome the issue of fixed distance weight matrix where there is uneven distribution of neighbours, we can use directly control the numbers of neighbours using k-nearest neighbours, as shown in the code chunk below.
As a rule-of-thumb, we will set k = 8 i.e., all regions will have 8 neighbours.
Neighbour list object:
Number of regions: 761
Number of nonzero links: 6088
Percentage nonzero weights: 1.051248
Average number of links: 8
Non-symmetric neighbours list
Plotting Adaptive Distance-based Neighbours
Selecting a spatial weight matrix is use is dependent on the geographical area of interest and the focus of the study8. In our case, between contiguity-based and distance-based spatial weight matrices, we lean towards distance-based matrices. Within distance-based matrices, we will select the adaptive distance-based spatial weight matrix for our subsequent analysis.
The reasons are summarised here:
Nigeria has 761 LGAs with varying sizes. Hence, a contiguity-based matrix will have the issue where larger LGAs have more neighbours and smaller LGAs have lesser neighbours. This would likely skew our analysis. Therefore, distance-based methods are preferred.
As mentioned earlier, the fixed distance-based method has the disadvantage that some regions would only have 1 neighbour, while on average regions have 23 neighbours. Statistical test for regions with only 1 neighbour may not be valid.
Based on the above, we will select adaptive distance-based spatial weight matrix.
After selecting the weight matrix to use, we will now assign weights to each neighboring polygon. Each neighboring polygon will be assigned equal weight (style=“W”) by assigning the fraction 1/(#of neighbors) to each neighbouring area. This is also known as a row-standardised matrix where each row in the matrix sums to 1.
Characteristics of weights list object:
Neighbour list object:
Number of regions: 761
Number of nonzero links: 6088
Percentage nonzero weights: 1.051248
Average number of links: 8
Non-symmetric neighbours list
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 761 579121 761 171.0938 3102.562
We will be using the row-standardised weight matrix for the next part of the analysis.
This in sub-section, we will use two methods: Moran’s I and Geary’s C to test the hypothesis the following hypothesis:
H0: Observed spatial patterns of values is equally likely as any other spatial pattern i.e. data is randomly disbursed, no spatial pattern
H1: Data is more spatially clustered than expected by chance alone.
We will perform Moran’s I statistical testing by using moran.test() of spdep. Moran’s I describe how features differ from the values in the study area as a whole. The Moran I statistic ranges from -1 to 1. If the Moran I is:
positive (I>0): Clustered, observations tend to be similar
negative (I<0): Disperse, observations tend to be dissimilar
approximately zero: observations arranged randomly over space
The below code chunk will perform the Moran’s I test on both functional and non-functional water points.
Moran I test under randomisation
data: nigeria$pct_functional
weights: rswm_knn8
Moran I statistic standard deviate = 30.692, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.523237285 -0.001315789 0.000292099
Moran I test under randomisation
data: nigeria$pct_nonfunctional
weights: rswm_knn8
Moran I statistic standard deviate = 26.388, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.4496039329 -0.0013157895 0.0002919996
In both cases, since the p-value < 0.05, we have sufficient statistical evidence to reject the null hypothesis at the 95% level of confidence. This means that data is more spatially clustered than expected by chance alone. Since Moran I statistics are larger than 0, the observation are clustered, observations tend to be similar.
Computing Monte Carlo Moran’s I
If we have doubts that the assumptions of Moran’s I are true (normality and randomisation), we can use a Monte Carlo simulation to perform a permutation test for Moran’s I.
The permutation tests consists of randomly reassigning the attribute values to a cell under the assumption of no spatial pattern. This random assignment is conducted n times. Each time, we will compute the Moran’s I to creating an empirical distribution of Moran’s I under H0.
The code chunk below performs permutation test for Moran’s I statistic by using moran.mc() of spdep. A total of 1000 simulation will be performed.
Monte-Carlo simulation of Moran I
data: nigeria$pct_functional
weights: rswm_knn8
number of simulations + 1: 1000
statistic = 0.52324, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Monte-Carlo simulation of Moran I
data: nigeria$pct_nonfunctional
weights: rswm_knn8
number of simulations + 1: 1000
statistic = 0.4496, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
In both cases, since the pseudo p-value is < 0.05, we have sufficient statistical evidence to reject the null hypothesis at the 95% level of confidence. This means that data is more spatially clustered than expected by chance alone for both functional and non-functional water points.
Visualising Monte Carlo Moran’s I
We can examine the simulated Moran’s I test statistics in greater detail through plotting the distribution of the statistical values as a histogram by using the code chunks below.
Let’s visualise Monte Carlo Moran’s I using a histogram. This can be created by using ggplot2 package.
df <- as.data.frame(bperm_func$res)
colnames(df) <- c("Simulated Moran's I")
moran_mc_func <- ggplot(df, aes(x=`Simulated Moran's I`)) +
geom_histogram(color = "black", fill = "grey", bins = 25) +
xlim(-0.1,0.6) +
ylab('Frequency') +
geom_vline(xintercept = 0, color = 'red') +
geom_vline(xintercept = 0.523 , color = 'blue') +
ggtitle("Histogram of Monte Carlo Simulated \nMoran's I (Functional WP)") +
theme(plot.title = element_text(size = 10, hjust = 0.5)) +
annotate("text", x = 0.35, y = 410, label = "Actual Moran's I", color = 'blue')
df <- as.data.frame(bperm_nonfunc$res)
colnames(df) <- c("Simulated Moran's I")
moran_mc_nonfunc <- ggplot(df, aes(x=`Simulated Moran's I`)) +
geom_histogram(color = "black", fill = "grey", bins = 25) +
xlim(-0.1,0.6) +
ylab('Frequency') +
geom_vline(xintercept = 0, color = 'red') +
geom_vline(xintercept = 0.450 , color = 'blue') +
ggtitle("Histogram of Monte Carlo Simulated \nMoran's I (Non-Functional WP)") +
theme(plot.title = element_text(size = 10, hjust = 0.5)) +
annotate("text", x = 0.28, y = 410, label = "Actual Moran's I", color = 'blue')
moran_mc_func + moran_mc_nonfunc
In both cases, the actual Moran’s I value (blue line) is near the extremes of the distribution of the simulated data. This suggests a statistically significant relationship and evidence of positive autocorrelation i.e. cluster9.
Geary’s C considers the difference between respective observations10--this means that it describe how features differ from their immediate neighbours. Geary’s C range from -1 to an undefined number above 1. If the Geary’s C is:
The code chunk below performs Geary’s C test for spatial autocorrelation for functional and non-functional water points using geary.test() of spdep.
Geary C test under randomisation
data: nigeria$pct_functional
weights: rswm_knn8
Geary C statistic standard deviate = 29.377, p-value < 2.2e-16
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation Variance
0.470163795 1.000000000 0.000325287
Geary C test under randomisation
data: nigeria$pct_nonfunctional
weights: rswm_knn8
Geary C statistic standard deviate = 25.29, p-value < 2.2e-16
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation Variance
0.5393883718 1.0000000000 0.0003317272
In both cases, since the p-value < 0.05, we have sufficient statistical evidence to reject the null hypothesis at the 95% level of confidence. This means that data is more spatially clustered than expected by chance alone. The Geary C statistics are less than 1 suggesting that clusters are present in both cases. This finding is consistent with the results of the Global Moran’s I test in the previous section.
Computing Monte Carlo Geary’s C
Similar to Moran’s I, we can also use Monte Carlo simulation to perform a permutation test for Geary’s C. The code chunk below performs permutation test for Geary’s C statistic by using geary.mc() of spdep. A total of 1000 simulation will be performed.
Monte-Carlo simulation of Geary C
data: nigeria$pct_functional
weights: rswm_knn8
number of simulations + 1: 1000
statistic = 0.47016, observed rank = 1, p-value = 0.001
alternative hypothesis: greater
Monte-Carlo simulation of Geary C
data: nigeria$pct_nonfunctional
weights: rswm_knn8
number of simulations + 1: 1000
statistic = 0.53939, observed rank = 1, p-value = 0.001
alternative hypothesis: greater
Since the pseudo p-value = 0.001 < 0.05, we have sufficient statistical evidence to reject the null hypothesis at the 95% level of confidence. This means that data is more spatially clustered than expected by chance alone.
Visualising the Monte Carlo Geary’s C
We can examine the simulated Geary’s C test statistics in greater detail through plotting the distribution of the statistical values as a histogram by using the code chunks below. This can be created by using ggplot2 package.
df <- as.data.frame(bperm_func$res)
colnames(df) <- c("Simulated Geary's C")
geary_mc_func <- ggplot(df, aes(x=`Simulated Geary's C`)) +
geom_histogram(color = "black", fill = "grey", bins = 25) +
xlim(0.4,1.2) +
ylab('Frequency') +
geom_vline(xintercept = 0, color = 'red') +
geom_vline(xintercept = 0.470 , color = 'blue') +
ggtitle("Histogram of Monte Carlo Simulated \nGeary's C (Non-Functional WP)") +
theme(plot.title = element_text(size = 10, hjust = 0.5)) +
annotate("text", x = 0.30, y = 410, label = "Actual Geary's C", color = 'blue')
df <- as.data.frame(bperm_nonfunc$res)
colnames(df) <- c("Simulated Geary's C")
geary_mc_nonfunc <- ggplot(df, aes(x=`Simulated Geary's C`)) +
geom_histogram(color = "black", fill = "grey", bins = 25) +
xlim(0.4,1.2) +
ylab('Frequency') +
geom_vline(xintercept = 0, color = 'red') +
geom_vline(xintercept = 0.540 , color = 'blue') +
ggtitle("Histogram of Monte Carlo Simulated \nGeary's C (Non-Functional WP)") +
theme(plot.title = element_text(size = 10, hjust = 0.5)) +
annotate("text", x = 0.37, y = 410, label = "Actual Geary's C", color = 'blue')
geary_mc_func + geary_mc_nonfunc
In both cases, the actual Geary’s C value (blue line) is near the extremes of the distribution of the simulated data. This suggests a statistically significant relationship and evidence of positive autocorrelation i.e. cluster.
Spatial correlograms are great to examine patterns of spatial autocorrelation in your data or model residuals. They show how correlated are pairs of spatial observations when you increase the distance (lag) between them - they are plots of some index of autocorrelation (Moran’s I or Geary’s c) against distance.
In spatial correlograms, the number of bins determines the distance range of each bin. The range is the maximum distance divided by the number of bins11.
In the code chunk below, sp.correlogram() of spdep package is used to compute a 8-lag spatial correlogram for Moran’s I and the autocorrelation coefficient. The plot() of base Graph is then used to plot the output.


We can use the following code chunk if we are interested to find out the distance range of each bin / lag.
[1] 37386.50 73925.69 112081.23 152528.12 194617.08 238153.84 283553.45
[8] 329774.69
Next, let’s examine the full analysis report and view which values are statistically significant.
Spatial correlogram for nigeria$pct_functional
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (761) 5.2324e-01 -1.3158e-03 2.9210e-04 30.692 < 2.2e-16
2 (761) 4.3338e-01 -1.3158e-03 1.4782e-04 35.754 < 2.2e-16
3 (761) 3.6793e-01 -1.3158e-03 1.0116e-04 36.713 < 2.2e-16
4 (761) 3.2705e-01 -1.3158e-03 7.5025e-05 37.910 < 2.2e-16
5 (761) 2.8922e-01 -1.3158e-03 5.8720e-05 37.914 < 2.2e-16
6 (761) 2.4635e-01 -1.3158e-03 4.8735e-05 35.478 < 2.2e-16
7 (761) 2.0785e-01 -1.3158e-03 4.2396e-05 32.124 < 2.2e-16
8 (761) 1.7834e-01 -1.3158e-03 3.7887e-05 29.188 < 2.2e-16
1 (761) ***
2 (761) ***
3 (761) ***
4 (761) ***
5 (761) ***
6 (761) ***
7 (761) ***
8 (761) ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Spatial correlogram for nigeria$pct_nonfunctional
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (761) 4.4960e-01 -1.3158e-03 2.9200e-04 26.3881 < 2.2e-16
2 (761) 3.1842e-01 -1.3158e-03 1.4777e-04 26.3026 < 2.2e-16
3 (761) 2.3165e-01 -1.3158e-03 1.0112e-04 23.1671 < 2.2e-16
4 (761) 1.5268e-01 -1.3158e-03 7.5000e-05 17.7818 < 2.2e-16
5 (761) 7.5056e-02 -1.3158e-03 5.8701e-05 9.9681 < 2.2e-16
6 (761) 3.9018e-02 -1.3158e-03 4.8719e-05 5.7786 7.532e-09
7 (761) 3.1312e-02 -1.3158e-03 4.2382e-05 5.0118 5.393e-07
8 (761) 2.2785e-02 -1.3158e-03 3.7874e-05 3.9161 8.997e-05
1 (761) ***
2 (761) ***
3 (761) ***
4 (761) ***
5 (761) ***
6 (761) ***
7 (761) ***
8 (761) ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output, we see that all results in both cases are statistically significant at the 95% level of confidence. The correlogram depicts how the spatial autocorrelation changes with distance. From the chart, we can see that Moran’s I decreases when spatial lag increases. This means that there is a quite strong spatial autocorrelation that decreases as spatial lag increases.
Comparing functional and non-functional water points, we can see that in the latter, from lag-5 onwards, the spatial autocorrelation < 0.1 and nears 0. In the case of functional water points, the Moran’s I value at lag 5 is around 0.3.
Similarly, we can do the same for Geary’s C.


Let’s use the code chunk below to print the full analysis report.
Spatial correlogram for nigeria$pct_functional
method: Geary's C
estimate expectation variance standard deviate Pr(I) two sided
1 (761) 4.7016e-01 1.0000e+00 3.2529e-04 -29.377 < 2.2e-16 ***
2 (761) 5.7163e-01 1.0000e+00 1.9203e-04 -30.912 < 2.2e-16 ***
3 (761) 6.3944e-01 1.0000e+00 1.5802e-04 -28.682 < 2.2e-16 ***
4 (761) 6.7225e-01 1.0000e+00 1.2514e-04 -29.299 < 2.2e-16 ***
5 (761) 7.1004e-01 1.0000e+00 1.0807e-04 -27.891 < 2.2e-16 ***
6 (761) 7.3684e-01 1.0000e+00 9.8181e-05 -26.558 < 2.2e-16 ***
7 (761) 7.6512e-01 1.0000e+00 9.8474e-05 -23.669 < 2.2e-16 ***
8 (761) 7.8220e-01 1.0000e+00 1.0972e-04 -20.792 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Spatial correlogram for nigeria$pct_nonfunctional
method: Geary's C
estimate expectation variance standard deviate Pr(I) two sided
1 (761) 0.53938837 1.00000000 0.00033173 -25.2897 < 2.2e-16 ***
2 (761) 0.67620379 1.00000000 0.00020058 -22.8630 < 2.2e-16 ***
3 (761) 0.74920671 1.00000000 0.00016899 -19.2922 < 2.2e-16 ***
4 (761) 0.82171931 1.00000000 0.00013480 -15.3552 < 2.2e-16 ***
5 (761) 0.89720975 1.00000000 0.00011759 -9.4791 < 2.2e-16 ***
6 (761) 0.92911305 1.00000000 0.00010771 -6.8302 8.479e-12 ***
7 (761) 0.95078872 1.00000000 0.00010928 -4.7075 2.507e-06 ***
8 (761) 0.97228216 1.00000000 0.00012356 -2.4935 0.01265 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output, we see that all results are statistically significant at the 95% level of confidence for both cases.
The correlogram depicts how the spatial autocorrelation changes with distance. From the chart, we can see that Geary’s C increases when spatial lag increases. This is unsurprising, given that Moran’s I and Geary’s C are inversely related. Also, all Geary C’s values are < 1 which suggests clustering.
In the case of non-functional water points, from Lag 5, the values are >= 0.9, nearing 1 which suggests little to no cluster or dissimilar observations. On the other hand, the values for functional water points are <= 0.8 for all lags.
In the previous section, we have established through statistical testing that spatial clustering of non-functional water points occur in Nigeria. Now, we seek to detect clusters or outliers and discover if there are any hot or cold spots of non-functional water points using Local Spatial Autocorrelation Statistics. They include: Anselin’s Moran Scatterplot and Local Indicators of Spatial Autocorrelation (LISA) and Getis-Ord Gi Statistics.
The Local Moran’s I is a local spatial autocorrelation statistic based on the Moran’s I statistic. It was developed by Anselin (1995) as a LISA statistic. Anselin defines LISA to have two properties:
“The LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation”; and
“the sum of LISAs for all observations is proportional to a global indicator of spatial association.”
The first point means that given an attribute of interest, for each region in Nigeria, the LISA indicates existence and extent of spatial clustering of regions with similar attribute values12.
Positive Local Moran’s I value indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. Negative Local Moran’s I value indicates that a feature has neighboring features with dissimilar values; this feature is an outlier13.
Computing Local Moran’s I
Before we can map the values, we first need to compute them using the localmoran() function of spdep package. It computes Ii values, given a set of zi values and a listw object providing neighbour weighting information for the polygon associated with the zi values.
The code chunk below is used to compute the Local Moran’s I of percentage of functional and non-functional water points at the LGA level.
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 0.37995419 -2.159064e-04 0.020344291 2.6653649 0.007690482
2 0.38571528 -2.312021e-04 0.021785225 2.6148457 0.008926777
3 -0.07290031 -2.549836e-04 0.024025489 -0.4686747 0.639302139
4 0.05542823 -1.839603e-05 0.001733752 1.3316242 0.182983719
5 0.47953038 -5.726499e-04 0.053940028 2.0671827 0.038716939
6 0.12006509 -3.150906e-05 0.002969563 2.2038623 0.027534029
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 0.6506454 -0.0008294518 0.07810908 2.331025 0.019752051
2 0.5944165 -0.0005017373 0.04726386 2.736483 0.006209989
3 0.1855012 -0.0016465926 0.15493203 0.475460 0.634459103
4 0.8661740 -0.0007449612 0.07015857 3.272941 0.001064347
5 0.6411397 -0.0010566011 0.09947700 2.036134 0.041736858
6 -0.1639149 -0.0001770915 0.01668751 -1.267515 0.204971134
localmoran() function returns a matrix of values whose columns are:
Ii: the local Moran’s I statistics
E.Ii: the expectation of local moran statistic under the randomisation hypothesis
Var.Ii: the variance of local moran statistic under the randomisation hypothesis
Z.Ii:the standard deviate of local moran statistic
Pr(): the p-value of local moran statistic
The code chunk below list the content of the local Moran matrix derived by using printCoefmat().
Mapping Local Moran’s I values and p-values
Next, we will append the local Moran’s I dataframe (i.e. localMI_func and localMI_nonfunc) onto nigeria SpatialPolygonDataFrame in preparation for the next part. This can be done using the code chunks below.
We must also consider the p-values for each of the Local Moran’s I values before interpreting them. Let’s visualise both the Local Moran’s I values and its p-values using the choropleth mapping functions of tmap package
localMI_func.map <- tm_shape(nigeria.localMI_func) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran I Statistics") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Moran's I Map \n(Functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
pvalue_func.map <- tm_shape(nigeria.localMI_func) +
tm_fill(col = "Pr.Ii",
breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette = "-Blues",
title = "Local Moran's I p-values") +
tm_borders(alpha = 0.3)+
tm_layout(main.title = "Local Moran's I p-values Map \n(Functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(localMI_func.map, pvalue_func.map, asp = 1, ncol = 2)
localMI_nonfunc.map <- tm_shape(nigeria.localMI_nonfunc) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran I Statistics") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Moran's I Map \n(Non-Functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
pvalue_nonfunc.map <- tm_shape(nigeria.localMI_nonfunc) +
tm_fill(col = "Pr.Ii",
breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette = "-Blues",
title = "Local Moran's I p-values") +
tm_borders(alpha = 0.3)+
tm_layout(main.title = "Local Moran's I p-values Map \n(Non-Functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(localMI_nonfunc.map, pvalue_nonfunc.map, asp = 1, ncol = 2)
Something extra that we can also do is to build a choropleth and shade only regions that are statistically significant. This can be done using the code chunk below. We first create a new object nigeria.localMI_sig_f that consists of statistically significant values for functional water points. Then we plot a base map that consists of just the polygons features. Lastly, we overlay the base map with a new map that consists of the statistically significant local Moran’s I value. We repeat this for non-functional water points.
#Functional
nigeria.localMI_sig_f <- cbind(nigeria,localMI_func) %>%
rename(Pr.Ii = Pr.z....E.Ii..) %>%
filter(Pr.Ii < 0.05)
base <- tm_shape(nigeria) +
tm_fill(col = 'gray98') +
tm_borders(alpha = 0.3)
localMI_sig_f.map <- base +
tm_shape(nigeria.localMI_sig_f) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran I Statistics") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Moran's I (Sig.) Map \n(Functional WP) ",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
#Non-Functional
nigeria.localMI_sig_nf <- cbind(nigeria,localMI_nonfunc) %>%
rename(Pr.Ii = Pr.z....E.Ii..) %>%
filter(Pr.Ii < 0.05)
base <- tm_shape(nigeria) +
tm_fill(col = 'gray98') +
tm_borders(alpha = 0.3)
localMI_sig_nf.map <- base +
tm_shape(nigeria.localMI_sig_nf) +
tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran I Statistics") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Moran's I (Sig.) Map \n(Non-Functional WP) ",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(localMI_sig_f.map,localMI_sig_nf.map)
Recall that a positive Local Moran’s I value indicates that a feature is part of a cluster and a negative Local Moran’s I value indicates that a feature is an outlier.
For the functional water points map, we see most regions are in shades of green which corresponds to them being in cluster(s). This is similar for the non-functional water points map. There are areas of overlap on both maps (top) and also areas where both maps differ (bottom of the map). Later on, we will combine this analysis with LISA cluster map to derive more information.
The Anselin’s Moran Scatterplot allows us to assess how similar or dissimilar an observed value is to it’s neighbouring observations. It is a visualisation tool that gives us a visual representation of spatial associations in the neighbourhood around each observation14.
Standardising Variable
Before we plot the scatterplot, we can also standardise the variable first. This will result in the scatterplot to be centered on the coordinates (0,0) which may be easier to interpret.
We can use scale() to center and scale the variable. Centering is done by subtracting the mean from the corresponding columns, and scaling is done by dividing the (centered) variable by their standard deviations.
After running the code chunk below, a new column Z.pct_nonfunctional will be created to store the standardised values. The as.vector() added to the end is to make sure that the data type we get out of this is a vector, that map neatly into out dataframe.
Simple feature collection with 3 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 481557.7 ymin: 116934.8 xmax: 552734.5 ymax: 579396
Projected CRS: Minna / Nigeria West Belt
LGA Level shapeID shapeGroup shapeType
1 Aba North ADM2 NGA-ADM2-72505758B79815894 NGA ADM2
2 Aba South ADM2 NGA-ADM2-72505758B67905963 NGA ADM2
3 Abaji ADM2 NGA-ADM2-72505758B61968000 NGA ADM2
geometry wpt_functional wpt_nonfunctional wpt_unknown
1 MULTIPOLYGON (((552560.3 12... 7 9 1
2 MULTIPOLYGON (((545153.6 12... 29 35 7
3 MULTIPOLYGON (((510602.3 57... 23 34 0
wpt_total pct_functional pct_nonfunctional Area wpt_func_density
1 17 0.4117647 0.5294118 18723031 [m^2] 0.37387109 [1/m^2]
2 71 0.4084507 0.4929577 43383884 [m^2] 0.66845099 [1/m^2]
3 57 0.4035088 0.5964912 832968699 [m^2] 0.02761208 [1/m^2]
wpt_nonfunc_density Z.pct_functional Z.pct_nonfunctional
1 0.48069141 [1/m^2] -0.4048126 0.7934450
2 0.80675119 [1/m^2] -0.4189065 0.6171056
3 0.04081786 [1/m^2] -0.4399237 1.1179293
In the output above, we can see the two new columns added.
Now, we can plot the Moran scatterplot using the standardised values and moran.plot() of spdep.


In both cases, we will find that the output of the Moran Scatterplots to be consistent with the Moran’s I scores tabulated in the previous sub-section. The Moran scatterplot has 4 quadrants:
“High-High” - Top right corner : Positive Autocorrelation Cluster. Regions here have high percentage of non-functional water points and are surrounded by other areas that have the higher than average level of non-functional water points.
“Low-High” - Top left corner: Negative Autocorrelation Cluster. Regions here have low percentage of non-functional water points and are surrounded by other areas that have the higher than average level of non-functional water points.
“High-Low” - Bottom right corner: Negative Autocorrelation Cluster. Regions here have high percentage of non-functional water points and are surrounded by other areas that have the lower than average level of non-functional water points.
“Low-Low” - Bottom left corner : Positive Autocorrelation Cluster. Regions here have low percentage of non-functional water points and are surrounded by other areas that have the lower than average level of non-functional water points.
Unfortunately, the Moran scatterplot does not tells us which points/regions are significant. To overcome this, we can use the LISA Cluster Maps.
LISA Cluster Maps also categorises each region into one of five groups: (1) High-High, (2) High-Low, (3) Low-High, (4) Low-Low and (5) Insignificant.
We can prepare a LISA cluster map by following these steps:
DV by using a by using the spatially lagged version (lag_GDPPC) of the variable of interested (GDPPC) and center it around its means. When DV > 0, the spatially lagged variable of the region is higher than the mean.L_MI using the Local Moran’s I.#Step 1
quadrant <- vector(mode = 'numeric', length = nrow(localMI_func))
#Step 2
nigeria$lag_pct_func <- lag.listw(rswm_knn8, nigeria$pct_functional)
DV_func <- nigeria$lag_pct_func - mean(nigeria$lag_pct_func)
#Step 3
LM_I_func <- localMI_func[,1]
#Step 4
signif <- 0.05
#Step 5
quadrant[DV_func <0 & LM_I_func>0] <- 1 #low-low
quadrant[DV_func >0 & LM_I_func<0] <- 2 #high-low
quadrant[DV_func <0 & LM_I_func<0] <- 3 #low-high
quadrant[DV_func >0 & LM_I_func>0] <- 4 #high-high
#Step 6
quadrant[localMI_func[,5]>signif] <- 0This is the code chunk to prepare the LISA cluster map.
#Assign each region to its respective quardrant
nigeria.localMI_func$quadrant <- quadrant
#Set the colours--one for each quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
LISAmap_func <- tm_shape(nigeria.localMI_func) +
tm_fill(col = "quadrant",
style = "cat",
palette = colors[c(sort(unique(quadrant)))+1],
labels = clusters[c(sort(unique(quadrant)))+1]) +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "LISA Cluster Map \n (Functional WP) ",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)We will repeat the same steps for non-functional water points.
#Step 1
quadrant <- vector(mode = 'numeric', length = nrow(localMI_nonfunc))
#Step 2
nigeria$lag_pct_nonfunc <- lag.listw(rswm_knn8, nigeria$pct_nonfunctional)
DV_nonfunc <- nigeria$lag_pct_nonfunc - mean(nigeria$lag_pct_nonfunc)
#Step 3
LM_I_nonfunc <- localMI_nonfunc[,1]
#Step 4
signif <- 0.05
#Step 5
quadrant[DV_nonfunc <0 & LM_I_nonfunc>0] <- 1 #low-low
quadrant[DV_nonfunc >0 & LM_I_nonfunc<0] <- 2 #high-low
quadrant[DV_nonfunc <0 & LM_I_nonfunc<0] <- 3 #low-high
quadrant[DV_nonfunc >0 & LM_I_nonfunc>0] <- 4 #high-high
#Step 6
quadrant[localMI_nonfunc[,5]>signif] <- 0This is the code chunk to prepare the LISA cluster map.
#Assign each region to its respective quardrant
nigeria.localMI_nonfunc$quadrant <- quadrant
#Set the colours--one for each quadrant
colors <- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
clusters <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
LISAmap_nonfunc <- tm_shape(nigeria.localMI_nonfunc) +
tm_fill(col = "quadrant",
style = "cat",
palette = colors[c(sort(unique(quadrant)))+1],
labels = clusters[c(sort(unique(quadrant)))+1]) +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "LISA Cluster Map \n (Non-Functional WP) ",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)We can plot the local Moran’s I map (statistically significant values only) and LISA map together. What we will observe is that the shaded regions are the same for each pair of maps.

The LISA maps provides another level of information–which is whether the region have relatively higher or lower percentage of non-functional water points.
Functional: From the Local Moran’s I map, we can make out that regions have positive Local Moran’s I values, suggesting that they have neighbours with similarly high or low percentage of functional water points. This would be consistent with the dark blue (low-low) and red (high-high) regions in the LISA map. We can see that the top areas are in red while bottom are in blue.
Non-functional: From the Local Moran’s I map, we can make out that regions have positive Local Moran’s I values, suggesting that they have neighbours with similarly high or low percentage of non-functional water points. This would be consistent with the dark blue (low-low) and red (high-high) regions in the LISA map. We can see that the top areas are in blue while bottom are in red.
Overall, Local Moran’s have revealed significant spatial cluster and outliers.
Functional Water points
For ease of comparison, here is the map of the percentage of functional water points and the LISA map. I have set the mode to allow for interactive viewing - this means that we can zoom in / out of the map and click on the LGAs too.
At a glance, in the LISA map, we can see two larger clusters–a high-high (red) cluster situation near the top and a low-low (blue) cluster situation near the bottom of the map. We also notice no significant spatial clusters or outliers in the middle/central area of the map. It may be geographically related. The top area borders nearby countries like Niger and Chad while the bottom are coastal areas bordering the Gulf of Guinea.
There are four high-high clusters areas. The regions in this area and their neighbours have a high percentage of functional water points The largest one is at the northern portion of the map. In that area, there are pockets of low-high outliers (light blue)--meaning that these areas have a lower percentage of functional water points as compared to their neighbours and may be worth investigating. There is another high-high cluster on the top right. The LGAs sandwiched between the two high-high clusters may also be worth a deeper look.
There are around two larger low-low clusters (dark blue). The largest one is in the southern part of the country. There are also pockets of high-low outliers in the same parts. Interestingly, the cluster extends from the coastal areas and move towards the inner parts of the country in a C shape–the centre of the ‘C’ are not identified as cluster or outliers.
Non-Functional Water Points
Similarly, for ease of comparison, here is the map of the percentage of non-functional water points and the LISA map.
We can see all categories of cluster/outlier classification on the LISA map. The first two observations we can make are that there are two large clusters–one low-low cluster situation near the top and a high-high cluster situation near the bottom of the map. We also notice no significant spatial clusters or outliers in the middle/central area of the map. It may be geographically related. The top area borders nearby countries like Niger and Chad while the bottom are coastal areas bordering the Gulf of Guinea.
There are around 6 low-low clusters (dark blue). The regions in this area and their neighbours have a low percentage of non-functional water points. The larger low-low cluster is relatively extensive. It extends from the top middle of the map to the top right corner and consists parts of states like Borno, Yobe, Dutse and Kano.
There are four high-high clusters (red). The regions in this area and their neighbours have a high percentage of non-functional water points The largest one extends from bottom, slightly-off centre of the map upwards. This includes LGAs in the states of Edo, Delta and Baylesa. Towards the right, there is another high-high cluster in the state of Cross River. The first and second clusters are relatively close to each other as compared to the remaining two. An interesting area to further study is Kwame and Fumakaye LGAs. They are high-high clusters situated near a large low-low cluster.
We see pockets of outliers throughout the country. There are two low-high outliers that are situated next to clusters. The regions have low percentage of non-functional water points while their neighbours have higher percentage of the same. We can see two outliers Bagudo and Fakai near the top left area of the map. The former is situated next to a high-high cluster. There is no low-high clusters towards the top and top right of the map, which is probably because the area has a large low-low spatial cluster. On the other hand, there are pockets of high-low outliers like Tudun Wada and Karaye LGAs.

Comparing both maps, we can see a different in high-high and low-low clusters, it’s almost always inverse. Also, both maps have the similarity where the centre part of the map has no clusters or outliers identified.
In the sub-section, we will use a local spatial autocorrelation statistic to detect hot or cold spots. Hot spot refers to areas that have higher values relative to its surroundings.
The Getis and Ord’s G-statistics (Getis and Ord, 1972; Ord and Getis, 1995) are alternative spatial statistics to detect spatial anomalies. It looks at each region within the context of its neighbouring features. A statistically significant hot spot will have high values and are surrounded by other areas with high values as well.
The Gi statistic is a z-score. For statistically significant and positive z-score, the higher the value, the more intense the clustering of high values is i.e. hot spot. Inversely, for a statistically significant negative z-score, the lower the value, the more intense the clustering of lower values is i.e. cold spot.
With reference to the context, we are hence looking for areas where the area itself and its neighbours have a high percentage of non-functional water points (hot spots) and areas where the areas itself and its neighbours have a lower percentage of non-functional water points.
The analysis consists of three steps:
Deriving spatial weight matrix
Computing Gi statistics
Mapping Gi statistic
Deriving spatial weights matrix
Getis-Ord defines neighbours based on distance. The Gi-statistics measures the degree of association that comes about from the concentration of weighted points and all other weighted points included within a radius of distance from the original point16. It included a weight component, which is a binary spatial weight matrix–where 1 represents a link.

Therefore, we can use the previously derived knn8, and weight it with a binary weight matrix. This can be done using the code chunk here.
Characteristics of weights list object:
Neighbour list object:
Number of regions: 761
Number of nonzero links: 6088
Percentage nonzero weights: 1.051248
Average number of links: 8
Non-symmetric neighbours list
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 761 579121 6088 10950 198564
Computing Gi Statistics
To compute the Gi statistics, we can use localG() from spdep package.
Mapping GI Values with Adaptive Distance Weights
To visualise the locations of hot spot and cold spot areas, we can use the choropleth mapping functions of tmap package.
The below code chunk plots the map for functional water points.
tmap_mode("plot")
Gimap_func <- tm_shape(nigeria.gi_func) +
tm_fill(col = "gstat_adaptive",
style = "pretty",
palette = "-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Gi Map \n (Functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(choropleth_plot(nigeria, "pct_functional", "pretty", "Proportion of Functional water points \nin Nigeria: partitioned by 'pretty' intervals"),
Gimap_func) 
The below code chunk plots the map for non-functional water points
tmap_mode("plot")
Gimap_nonfunc <- tm_shape(nigeria.gi_nonfunc) +
tm_fill(col = "gstat_adaptive",
style = "pretty",
palette = "-RdBu",
title = "local Gi") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Gi Map \n (Non-Functional WP) ",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(choropleth_plot(nigeria, "pct_nonfunctional", "pretty", "Proportion of Non-functional water points \nin Nigeria: partitioned by 'pretty' intervals"),
Gimap_nonfunc) 
Before we interpret the charts, we should only consider the statistically significant spots. We can refer to the localg()documentation for the critical values of the statistic for the 95th percentile. Since our data has close to 1,000 records, the critical value is +/-3.886. This means that the Gi Statistic is statistically significant if it is more than or less than 3.886.
We can use the code chunk below to filter out only the statistically significant spots.
nigeria.gi_sig_f <- cbind(nigeria, as.matrix(gi.adaptive_func)) %>%
rename(gstat_adaptive = as.matrix.gi.adaptive_func.) %>%
filter(gstat_adaptive > 3.886 | gstat_adaptive < -3.886 )
nigeria.gi_sig_nf <- cbind(nigeria, as.matrix(gi.adaptive_nonfunc)) %>%
rename(gstat_adaptive = as.matrix.gi.adaptive_nonfunc.) %>%
filter(gstat_adaptive > 3.886 | gstat_adaptive < -3.886 )To visualise the map with statistically significant Gi statistics, we can refer to the code chunk below. I have set the mode to allow for interactive viewing - this means that we can zoom in / out of the map and click on the LGAs too.
Functional Water Points
tmap_mode("view")
base <- tm_shape(nigeria) +
tm_fill(col = 'gray98') +
tm_borders(alpha = 0.3)
Gimap_sig_f <- base +
tm_shape(nigeria.gi_sig_f) +
tm_fill(col = "gstat_adaptive",
style = "pretty",
palette = "-RdBu",
title = "Local Gi") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Gi Statistic (Sig.) Map \n (Functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(choropleth_plot(nigeria, "pct_functional", "quantile", "Proportion of Functional water points \nin Nigeria: partitioned by 'pretty' intervals"),
Gimap_sig_f) The Gi statistics have revealed significant pockets of hot (red shades) spot areas and no cold spots. Right away, we observe that these areas are situated mostly in the northern regions. The identified of hot spots are less sparse than what we saw in the LISA map.
Looking at the hot spots, we see a relatively large hot spot areas at the top of the map. There are areas with higher intensity of functional water points, identified by the darker red shades and areas with less intensity in lighter red shades. This area borders the neighbouring country of Niger. It does seem that the intensity is higher in the outer areas before becoming less intense.
Non-functional Water Points
tmap_mode("view")
base <- tm_shape(st_transform(nigeria)) +
tm_fill(col = 'gray98') +
tm_borders(alpha = 0.3)
Gimap_sig_nf <- base +
tm_shape(nigeria.gi_sig_nf) +
tm_fill(col = "gstat_adaptive",
style = "pretty",
palette = "-RdBu",
title = "Local Gi") +
tm_borders(alpha = 0.3) +
tm_layout(main.title = "Local Gi Statistic (Sig.) Map \n (Non-functional WP)",
main.title.size = 1,
main.title.position = "center",
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE)
tmap_arrange(choropleth_plot(nigeria, "pct_nonfunctional", "quantile", "Proportion of Non-functional water points \nin Nigeria: partitioned by 'pretty' intervals"),
Gimap_sig_nf) The Gi statistics have revealed significant pockets of hot (red shades) and cold (blue shades) spot areas. At first glance, we can see that there are mostly cold spots on the top half of the map and mostly hot spots on the bottom of the map. This pattern is consistent with the LISA map, although the extent of hot/cold areas are different.
On the top right area, we can see a cold spot. It includes Bama, Dikwa, Mafa and Monguno with more intense clustering (darker shade of blue). They are situated towards the outer areas / near borders. The area beyond the dark blue shaded region are LGAs without water points at all (no data recorded). This may signify the importance of these water points and probably and importance to ensure they are all functional. The lower percentage of non-functional water points could also be because of the lower number of water points in the area, which increase the importance of existing water points. Towards the left, we have Gumel and Maigatari, Miga and Jahun that are also cold spots.
Looking at the hot spots, we see some at the bottom right of the map. At the coastal border, we have more intense clustering (dark red), suggesting higher number of non-functional water points in that area. Towards the mainland, the clustering is less intense but with pockets of dark reds.
In our study of the functional and non-functional water points in Nigeria using ESDA, we utilised exploratory spatial data analysis methods and visualisation tools to generate insights about the situation–more that what we could have found out using only traditional choropleths. Using Global Spatial Autocorrelation Statistics, we found out that there is existence of spatial autocorrelation in both cases of functional and non-functional water points. This led to us using local spatial autocorrelation statistical methods like Local Moran’s, Anselin Moran Scatterplot, LISA cluster maps and Getis and Ord’s G-Statistics to under areas of clusters or outliers and hot or cold spots.
Our cluster and outlier analysis on functional water points found clusters in the northern part of the map to have higher proportion of functional water points and in the southern part of the map to have less proportion of the same. The same analysis on non-functional water point found clusters on the northern part of Nigeria to have less proportion of non-functional water points and the clusters in the southern part to have higher proportion of the same.
Our hot and cold spot analysis of functional water points generated one large hot spot at the northern region of the map. The same analysis on non-functional water points found areas of hot spot (higher proportion of non-functional water points) towards the southern portion of the map and some cold spots towards the northern parts.
The two analyses conducted using different statistical methods yield different results. Visually we can observe that the hot/cold spots identified in Gi Map were less sparse as compared to the LISA clusters map. A common observation is that in both, the middle / central parts of Nigeria were not identified as clusters, the northern part is identified as a cold spot and cluster and the southern part, a hot spot and cluster for non-functional water points.
Overall, we can see that the southern areas have higher proportion of non-functional water points and northern areas have higher proportion of functional water points. This may gives valuable insights to inform public policy and improve the water situation.
We can explore the impact of living near the borders–either to the neighbouring countries or coastal areas. It would also be interesting to deep dive into location that are near rivers like the Niger or Benou.
It would also be interesting to bring in the water sources e.g., tap, well, etc., into the dataset and see if is any relationship between type of water source and its functional status. This applies for other demographic factors like population size, income level and education level.
https://www.worldbank.org/en/topic/water/overview↩︎
https://www.unwater.org/publications/un-world-water-development-report-2022↩︎
https://www.waterpointdata.org/about/↩︎
Okorie PN, Ademowo GO, Saka Y, Davies E, Okoronkwo C, Bockarie MJ, Molyneux DH, Kelly-Hope LA. Lymphatic filariasis in Nigeria; micro-stratification overlap mapping (MOM) as a prerequisite for cost-effective resource utilization in control and surveillance. PLoS Negl Trop Dis. 2013 Sep 5;7(9):e2416. doi: 10.1371/journal.pntd.0002416. PMID: 24040432; PMCID: PMC3764235.↩︎
https://www.unicef.org/nigeria/press-releases/nearly-one-third-nigerian-children-do-not-have-enough-water-meet-their-daily-needs↩︎
Mack, Vincent & Kam, Tin Seong. (2018). Is There Space for Violence?: A Data-driven Approach to the Exploration of Spatial-Temporal Dimensions of Conflict. 1-10. 10.1145/3282933.3282935.↩︎
https://geodacenter.github.io/workbook/4a_contig_weights/lab4a.html#rook-contiguity)↩︎
Chapter 2. Codifying the neighbourhood structure of Handbook of Spatial Analysis: Theory and Application with R.↩︎
https://swampthingecology.org/blog/geospatial-data-analysis-in-rstats.-part-2/↩︎
Chapter 2. Codifying the neighbourhood structure of Handbook of Spatial Analysis: Theory and Application with R.↩︎
https://geodacenter.github.io/workbook/5a_global_auto/lab5a.html#spatial-correlogram↩︎
http://ceadserv1.nku.edu/longa/geomed/ppa/doc/LocalI/LocalI.htm#:~:text=Local%20Moran’s%20I%20is%20a,spatial%20association%20or%20LISA%20statisti↩︎
https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/h-how-cluster-and-outlier-analysis-anselin-local-m.htm↩︎
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_variogram_details31.htm↩︎
Salubi, E.A., Elliott, S.J. Geospatial analysis of cholera patterns in Nigeria: findings from a cross-sectional study. BMC Infect Dis 21, 202 (2021). https://doi.org/10.1186/s12879-021-05894-2↩︎
Getis, A., & Ord, K. (1992). “The Analysis of Spatial Association by Use of Distance Statistics”. Geographical Analysis, 24, 189–206↩︎